flowchart TB
collect("Collect Data")
align("Align to Darwin Core")
create("Create a Darwin Core Archive (DwC-A)")
register("Register DwC-A with OBIS/GBIF")
collect-->align-->create-->register
Resources for the OBIS/GBIF enthusiast
Example DNA-Derived Datasets
| Title | Platform | Link |
|---|---|---|
| eDNA from Gulf of Mexico Ecosystems and Carbon Cruise 2021 (GOMECC-4) | OBIS | https://obis.org/dataset/210efc7c-4762-47ee-b4b5-22a0f436ef44 |
| | GBIF | https://doi.org/10.15468/sm6fpz |
| 18S Monterey Bay Time Series: an eDNA data set from Monterey Bay, California, including years 2006, 2013 - 2016 | OBIS | https://obis.org/dataset/62b97724-da17-4ca7-9b26-b2a22aeaab51 |
| | GBIF | https://doi.org/10.15468/84ntea |
| COI data from: Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal (Shea & Boehm, 2024) | OBIS | https://obis.org/dataset/54bc0e9c-e857-4216-a6ce-46cd6ae58cd7 |
| | GBIF | https://doi.org/10.15468/33artc |
| eDNA observations, concurrent with trawl survey, of marine fish in coastal New Jersey, USA 2019 | OBIS | https://obis.org/dataset/fe2ed263-2b21-47d7-a79f-f9b911132398 |
| | GBIF | https://doi.org/10.15468/zsrtyb |
| Kenai National Wildlife Refuge Aquatic Invasive Fish Surveys | GBIF | https://doi.org/10.15468/kzw2j5 |
GBIF and OBIS are international initiatives that aim to provide open access to biodiversity occurrence data.
Although they are best known for their publishing platforms and their role as data aggregators, both initiatives represent an investment from the international political community and a vibrant scientific community of nodes, publishers, and users of standards and practices.
GBIF
Occurrence Records: 2,688,761,528
Datasets: 104,088
Publishers: 2,720
Papers: 10,397
OBIS
Occurrence Records: 126,818,624
Datasets: 4,852
Taxa: 149,319
Map of the GBIF and OBIS networks. Countries where there is at least one GBIF node are in blue. Yellow points represent OBIS nodes.
USGS-SAS Biogeographic Science Branch manages the US nodes: GBIF-US and OBIS-USA.
The US has been a voting participant in GBIF since 2001 and joined OBIS as the OBIS-USA node in 2005.
The USGS is shepherding over two decades of investment in the robust plumbing needed for national and global biodiversity science and data.
Both are active and energetic communities that have developed, maintained, and promoted standards and practices for biodiversity science for more than 20 years.
Both promote open data and science, and publish massive amounts of standardized biological observations through their platforms.
GBIF and OBIS have signed a letter of agreement in recognition that their respective communities will benefit from more streamlined ways of working together. A stronger action plan is forthcoming.
GBIF and OBIS are built on the same data and metadata standards, and share open-source tools.
GBIF has a global scope. OBIS includes marine occurrences only.
GBIF is funded by individual participant countries. We fund GBIF-US through the NSF. OBIS is funded by UNESCO, an agency of the United Nations.
GBIF-US and OBIS-USA are unusual in that both nodes are managed by the same people.
This is an illustration of the GBIF workflow. The OBIS workflow is similar.
Off-the-shelf platforms for publishing and tracking use of biological occurrence data with minimal financial or staffing investment. USGS and Springer Nature accept GBIF as an archival repository, and publishing to OBIS-USA satisfies NOAA archiving requirements.
Both organizations rely on regular international investment for funding.
OBIS recently transitioned from a project to a program, which means it now receives core UNESCO/IOC funding and staff support to enable the activity to operate on a permanent basis.
Open standards and open science mean that migration of data would be relatively simple in the event of a surprise shutdown.
GBIF-US is funded by NSF (~ $600k / year). There is a US Delegation that guides GBIF for US interests.
OBIS-USA is funded through US participation in UN and UNESCO.
For operation of both nodes, USGS funds the node staff salary + travel. NOAA IOOS also contributes staff.
Both IPTs could easily transition to other management and continue to operate without US funding.
Both secretariats have funding and training in place for new nodes and node managers.
Taxonomic alignment
Facilitating round tripping of data, taxonomic updates
Sequence searches
eDNA publishing tools for non-technical / small-to-medium publishers
Expansion of US node staff
flowchart TB
collect("Collect Data")
align("Align to Darwin Core")
create("Create a Darwin Core Archive (DwC-A)")
register("Register DwC-A with OBIS/GBIF")
collect-->align-->create-->register
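As a rough illustration of the "Align to Darwin Core" step, the sketch below renames the columns of a raw sample sheet to Darwin Core terms. The raw column names (`sample_time`, `station`, `lat`, `lon`, `depth_m`) are hypothetical, as is the assumption that coordinates are GPS-derived WGS84; a real dataset will need its own mapping.

```python
# Hypothetical raw column names (left) mapped to Darwin Core terms (right).
COLUMN_MAP = {
    "sample_time": "eventDate",
    "station": "locationID",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "depth_m": "minimumDepthInMeters",
}


def align_to_darwin_core(raw_row):
    """Rename raw fields to Darwin Core terms and add fixed-value terms."""
    record = {dwc_term: raw_row[raw_name] for raw_name, dwc_term in COLUMN_MAP.items()}
    # A discrete sample taken at one depth has the same minimum and maximum depth.
    record["maximumDepthInMeters"] = record["minimumDepthInMeters"]
    # Assumption: coordinates come from GPS, so the datum is WGS84.
    record["geodeticDatum"] = "WGS84"
    return record


raw = {
    "sample_time": "2021-09-20T18:04-04:00",
    "station": "PANAMACITY_Sta21",
    "lat": "29.206",
    "lon": "-85.647",
    "depth_m": "39",
}
aligned = align_to_darwin_core(raw)
print(aligned)
```

Keeping the mapping in a single dictionary makes it easy to review with a data manager before any files are generated.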
https://docs.gbif.org/publishing-dna-derived-data/img/web/sampling-processes.en.svg
The DNADerivedData extension extends occurrence data with molecular biology metadata.
It mostly consists of terms from:
Guidance is available: Publishing DNA-derived data through biodiversity data platforms. The guide covers barcoding, metabarcoding, metagenomics and qPCR/ddPCR.
| Term | Example |
|---|---|
| occurrenceID | GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550 |
| eventDate | 2021-09-20T18:04-04:00 |
| locality | USA: Gulf of Mexico |
| locationID | PANAMACITY_Sta21 |
| decimalLatitude | 29.206 |
| decimalLongitude | -85.647 |
| geodeticDatum | WGS84 |
| minimumDepthInMeters | 39 |
| maximumDepthInMeters | 39 |
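Event and location terms like those in the table above lend themselves to simple automated checks before publishing. The following is a minimal sketch, not an official validator: the function name `check_event` and its messages are made up for illustration, and it assumes the numeric fields arrive as numeric strings.

```python
from datetime import datetime


def check_event(record):
    """Return a list of problems found in one event record (empty if none)."""
    problems = []
    try:
        datetime.fromisoformat(record["eventDate"])
    except ValueError:
        problems.append("eventDate is not a single ISO 8601 date-time")
    if not -90 <= float(record["decimalLatitude"]) <= 90:
        problems.append("decimalLatitude outside [-90, 90]")
    if not -180 <= float(record["decimalLongitude"]) <= 180:
        problems.append("decimalLongitude outside [-180, 180]")
    if float(record["minimumDepthInMeters"]) > float(record["maximumDepthInMeters"]):
        problems.append("minimumDepthInMeters exceeds maximumDepthInMeters")
    return problems


event = {
    "eventDate": "2021-09-20T18:04-04:00",
    "decimalLatitude": "29.206",
    "decimalLongitude": "-85.647",
    "minimumDepthInMeters": "39",
    "maximumDepthInMeters": "39",
}
print(check_event(event))
```

Darwin Core also allows eventDate values such as date ranges, which this simple check would flag, so treat its output as a prompt for review rather than a verdict.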
| Term | Example |
|---|---|
| occurrenceID | GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550 |
| basisOfRecord | MaterialSample |
| organismQuantity | 35 |
| organismQuantityType | DNA sequence reads |
| sampleSizeValue | 12436 |
| sampleSizeUnit | DNA sequence reads |
| associatedSequences | https://www.ncbi.nlm.nih.gov/sra/SRR26161072 \| https://www.ncbi.nlm.nih.gov/biosample/SAMN37516159 \| https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898 |
| identificationRemarks | Tourmaline; qiime2-2021.2; naive-bayes classifier, confidence (at lowest specified taxon): 0.966508279, against reference database: PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706. The PR2 database used for taxonomic assignment is primarily curated for protists, and may not accurately resolve metazoa, land plants or macrosporic fungi to lower taxonomic levels. |
| verbatimIdentification | Karenia brevis |
| scientificName | Karenia brevis |
| scientificNameID | urn:lsid:marinespecies.org:taxname:233015 |
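In the record above, organismQuantity is the number of reads assigned to this taxon in the sample and sampleSizeValue is the total number of reads in that sample, so their ratio gives a relative read abundance. A minimal sketch (the function name is illustrative only):

```python
def relative_abundance(organism_quantity, sample_size_value):
    """Fraction of a sample's reads assigned to one occurrence record."""
    if sample_size_value <= 0:
        raise ValueError("sampleSizeValue must be positive")
    return organism_quantity / sample_size_value


# Values from the Karenia brevis record: 35 of 12436 reads.
share = relative_abundance(35, 12436)
print(f"{share:.2%}")
```

For this record the share is roughly 0.28% of the reads in the sample. Read counts are not a direct measure of organism abundance, so the ratio is best used for comparisons within a consistently processed dataset.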
| Term | Example |
|---|---|
| occurrenceID | GOMECC4_YUCATAN_Sta100_Surface_B_occ1f111363da96fee3d180ddb12741d4ce |
| basisOfRecord | MaterialSample |
| organismQuantity | 18 |
| organismQuantityType | DNA sequence reads |
| sampleSizeValue | 12151 |
| sampleSizeUnit | DNA sequence reads |
| associatedSequences | https://www.ncbi.nlm.nih.gov/sra/SRR26160967 \| https://www.ncbi.nlm.nih.gov/biosample/SAMN37516435 \| https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898 |
| identificationRemarks | Tourmaline; qiime2-2021.2; naive-bayes classifier, confidence (at lowest specified taxon): 0.960018364, against reference database: PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706. The PR2 database used for taxonomic assignment is primarily curated for protists, and may not accurately resolve metazoa, land plants or macrosporic fungi to lower taxonomic levels. |
| verbatimIdentification | Unassigned |
| scientificName | Biota incertae sedis |
| scientificNameID | urn:lsid:marinespecies.org:taxname:12 |
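The two occurrence examples suggest a simple rule: keep the classifier's label in verbatimIdentification, and when it is "Unassigned", publish the record under the placeholder name and identifier shown above. The sketch below assumes a pre-built lookup from names to identifiers; the `name_lookup` dictionary and `resolve_name` function are hypothetical, and only the two identifiers from the tables are used.

```python
# Placeholder name and identifier taken from the unassigned example record.
PLACEHOLDER_NAME = "Biota incertae sedis"
PLACEHOLDER_ID = "urn:lsid:marinespecies.org:taxname:12"


def resolve_name(verbatim, name_lookup):
    """Build the identification terms for one classifier label."""
    if verbatim.strip().lower() == "unassigned":
        name, name_id = PLACEHOLDER_NAME, PLACEHOLDER_ID
    else:
        # Raises KeyError for names missing from the lookup so they get reviewed.
        name, name_id = verbatim, name_lookup[verbatim]
    return {
        "verbatimIdentification": verbatim,
        "scientificName": name,
        "scientificNameID": name_id,
    }


# Hypothetical lookup holding the one matched name from the example above.
name_lookup = {"Karenia brevis": "urn:lsid:marinespecies.org:taxname:233015"}
print(resolve_name("Unassigned", name_lookup))
print(resolve_name("Karenia brevis", name_lookup))
```

Failing loudly on unknown names, rather than silently falling back to the placeholder, is a design choice made here so that unmatched taxa are checked by a person.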
| Term | Example |
|---|---|
| occurrenceID | GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550 |
| DNA_sequence | GCTCCTACCGATTGAGTGATCCGGTGAATAATTCGGACTGCCGCAGTGTTCAGATCCTGAACGTTGCAGTGGAAAGTTTAGTGAACCTTATCACTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC |
| concentration | 1.177 |
| target_gene | 18S rRNA |
| target_subfragment | V9 |
| pcr_primer_forward | GTACACACCGCCCGTC |
| pcr_primer_reverse | TGATCCTTCTGCAGGTTCACCTAC |
| pcr_primer_name_forward | 1391f |
| pcr_primer_name_reverse | EukBr |
| pcr_primer_reference | 10.1371/journal.pone.0006372 |
| seq_meth | Illumina MiSeq 2x250 |
| otu_class_appr | Tourmaline; qiime2-2021.2; dada2; ASV |
| otu_db | PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706 |
| otu_seq_comp_appr | Tourmaline; qiime2-2021.2; naive-bayes classifier |
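Each DNADerivedData row is tied to its occurrence record through the shared occurrenceID, as the tables above show. The sketch below writes both tables as tab-delimited text and reports any extension rows that have no matching occurrence. The helper names are illustrative, only a few columns are included, and a complete Darwin Core Archive would also need its descriptor and metadata files, which are omitted here.

```python
import csv
import io


def to_tsv(rows, fieldnames):
    """Serialize a list of dicts as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def orphan_extension_ids(core_rows, extension_rows):
    """Return occurrenceIDs used in the extension but absent from the core."""
    core_ids = {row["occurrenceID"] for row in core_rows}
    extension_ids = {row["occurrenceID"] for row in extension_rows}
    return sorted(extension_ids - core_ids)


occ_id = "GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550"
occurrences = [{"occurrenceID": occ_id, "basisOfRecord": "MaterialSample"}]
dna_rows = [{"occurrenceID": occ_id, "target_gene": "18S rRNA", "target_subfragment": "V9"}]

occurrence_tsv = to_tsv(occurrences, ["occurrenceID", "basisOfRecord"])
dna_tsv = to_tsv(dna_rows, ["occurrenceID", "target_gene", "target_subfragment"])
print(orphan_extension_ids(occurrences, dna_rows))
```

An empty list means every extension row links back to an occurrence; any IDs returned point to rows that would be left unattached when the archive is read.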
You don’t have to use OBIS/GBIF.
Off-the-shelf platforms for publishing and tracking use of biological occurrence data with minimal financial or staffing investment.
Compliments? Complaints? Ideas?
Presented as a NOAA Omics Seminar on 2024-04-17.